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The development of next-generation sequencing technologies 
hasenabled rapid and cost effective wholegenomesequencing. 
This technology has allowed researchers to shortcut time- 
consuming and laborious methods used to identify nucleotide 
mutations in forward genetic screens in model organisms. 
However, causal mutations must still be mapped to a region 
of the genome so as to aid in their identification. This can 
be achieved simultaneously with deep sequencing through 
various methods. Here we discuss alternative deep sequencing 
strategies for simultaneously mapping and identifying causal 
mutations in Caenorhabditis elegans from mutagenesis screens. 
Focusing on practical considerations, such as the particular 
mutant phenotype obtained, this review aims to aid the reader 
in choosing which strategy to adopt to successfully clone their 
mutant. 



Introduction 

Forward genetics is a powerful means to identify the genes and 
molecular mechanisms that determine a phenotype. A chemical 
mutagenesis screen approach, usually employing an agent such 
as ethyl methanesulfonate (EMS), 1 has certain advantages, such 
as its unbiased nature in pinpointing DNA elements that are 
important for a certain phenotype. A mutagenesis screen will also 
produce valuable mutants, typically stable and often in multiple 
distinct alleles, which can be utilized as a resource for further 
characterization of the gene. Indeed, the localization and amino 
acid changes induced by mutagenesis can itself help to illumi- 
nate the role of the gene if, for instance, a functional domain of 
the corresponding protein is perturbed. Until recently, the main 
disadvantage of conducting mutagenesis screens was the relative 
difficulty and time involved in identifying the gene affected by 
the causal mutation. Although this bottleneck is avoided when 
resorting to large-scale RNA interference (RNAi) screens, 2 no 

Correspondence to: Sophie Jariault; Email: sophie@igbmc.fr 
Submitted: 04/05/13; Revised: 05/16/13; Accepted: 05/17/13 
Citation: http://dx.doi.org/10.4161/worm.25081 



mutant alleles are obtained, a disadvantage when further analyz- 
ing gene function. 

Traditionally, in order to identify the causal mutation, single 
nucleotide polymorphism (SNP) mapping is employed to narrow 
down a genomic region harboring the DNA lesion. 3 " 5 Typically, 
a cross is set up between mutant isolates and an alternative strain 
of the same species that contains polymorphic nucleotides. A 
quarter of the resulting second filial generation (F 2 ) progeny 
from an F, cross or self-cross (depending on the species being 
studied) will inherit the causal mutation in homozygous form. 
Unless the mutation is either dominant, or a parental RNA or 
protein contribution from a heterozygote parent is sufficient to 
rescue mutant offspring, these F 2 mutant progeny will display the 
mutant phenotype and can be selected. As a result of Mendelian 
inheritance of chromatids, as well as mostly random chromo- 
somal recombination during meiosis, the polymorphisms from 
the parental and mapping strains will be distributed in a roughly 
50/50 ratio in the selected F,, except for those SNPs that are 
genetically linked to the causal mutation. These linked polymor- 
phisms, which are within relatively close physical proximity to 
the causal mutation, are statistically less likely to be included in 
a chromosomal recombination event compared with those fur- 
ther away on the chromosome. Thus, the location of the causal 
mutation is betrayed by a stretch of the genome marked by a dis- 
proportionately high frequency of parental polymorphisms and 
an underrepresentation of mapping strain polymorphisms. This 
fundamental rule of chromosomal recombination is the basis of 
genetic mapping. 

Over the past 5 y, methods based on next-generation sequenc- 
ing (NGS) technology 6 have allowed advances in several research 
fields leading to the identification of the mutational processes 
involved in various types of cancers and genetic diseases in non- 
genetic model organisms such as humans; 7,8 or to obtain insights 
into phylogenetics and evolutionary processes. 9 Here, we will 
highlight how this technology has aided in the identification of 
mutations from forward genetic screens, with a focus on C. ele- 
gans. Two general approaches to identify causal mutations using 
NGS have been employed. In a first approach, the causal muta- 
tion is mapped to a genomic region using traditional SNP-based 
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approaches prior to sequencing the whole genome, 10 which is used 
to detect candidate nucleotide mutations within this interval. 
The second approach involves mapping as well as identifying the 
causal mutation simultaneously by whole genome sequencing. In 
this technical-based review, we focus on the alternative strategies 
developed for this second approach whereby NGS is employed 
to simultaneously pinpoint chromosomal recombination events 
(mapping) and detect candidate mutations in this mapped region. 
These powerful techniques have dramatically sped up a process 
that was traditionally achieved through detection of nucleotide 
variants one at a time, usually with the use of restriction endo- 
nucleases. It may be intuitive to ask that if the entire genome 
of a mutant is sequenced, why would there be a need to map a 
mutation to a specific region of the genome? Deep sequencing 
of a C. elegans genome will reveal the presence of many nucleo- 
tide variants that exist in the backgrounds of different strains 
used across the C. elegans community. 11 " 15 New nucleotide vari- 
ants, including those within genes, also appear relatively rapidly 
in strains kept in the laboratory. 16 These widespread variants 
severely hamper the identification of the true causal mutation (s), 
thus making the need for mapping essential. Indeed, it is not only 
in C. elegans that background nucleotide variants cause problems 
in causal mutation identification. This phenomenon is conserved 
across plants and animals, leading to the establishment of differ- 
ent strategies and methodological tools to simultaneously map 
and deep sequence mutants of a wide variety of species includ- 
ing Zebrafish, 17 Arabidopsis, 18 Rice," mouse 20 and of course 
C. elegans. 12,21 

SNP-Based Deep Sequence Mapping 

First proposed by Lister and colleagues and then tested in 
Arabidopsis tbaliana, the SNP-based method of simultaneous 
mapping and deep sequencing was successful in a proof-of-prin- 
ciple experiment to identify a recessive allele that caused slow 
growth and light green leaves. 18,22 The principle of the method is 
the same as classical SNP-based mapping. The difference being 
that instead of performing PCR, restriction digestion and agarose 
gel electrophoresis to identify polymorphism frequencies one at 
a time, deep sequencing is used to identify most of them at once. 

To create a mapping population, the mutant strain-pro- 
duced in a Columbia background-was crossed to the polymor- 
phic Landsberg erecta strain. A single genomic DNA sample was 
prepared from a pool of 500 mutant F, plants and an Illumina 
library was prepared and sequenced to 22-fold genome cover- 
age. Using this method, the authors observed a stretch of DNA 
devoid of Landsberg polymorphisms on chromosome 4, which 
contained a serine to asparagine nonsynonymous codon change 
in the A T4G35090 gene. 

This same technique has been adapted to C. elegans using the 
N2 Bristol parental strain and Hawaiian CB4856 23 as a mapping 
strain. 21 Essentially, the principle and methodology (Fig. 1A) are 
similar to that used for Arabidopsis. The mutant to be mapped 
(in an N2 background) is crossed to the CB4856 strain and F 2 
recombinant progeny that display the mutant phenotype are 
singled onto fresh plates and grown for a couple of generations 



(Fig. 2). These independent populations are then pooled and 
DNA is extracted and sequenced. In a proof-of-principle experi- 
ment, Doitsidou et al. mapped the location of a mutant defec- 
tive in dopaminergic neuron specification to the X chromosome. 
Encouragingly, a far lesser number of F 2 recombinant progeny 
were used here than the 500 previously used for Arabidopsis. 
When 50 F 2 recombinants were selected, grown and pooled, a 
mapping interval of 2.1 megabases (Mb) was defined on chro- 
mosome X. 21 When 20 F recombinants were used, a mapping 
interval of 4.9 Mb was defined on chromosome X. Within these 
mapped regions, a premature stop codon was found to be present 
in the vab-3 gene, disruption of which was found to control the 
generation of dopaminergic neurons. 

A similar SNP-based mapping approach using deep sequenc- 
ing was used by O'Rourke et al. (2011) to identify several mutant 
alleles in C. elegans. However, instead of sequencing total genomic 
DNA from pooled recombinant F 2 of a cross between the N2 
mutants and CB4856, deep sequencing was performed only on 
selectively amplified genomic fragments that were in close prox- 
imity to an EcoRI restriction site. SNPs within this restriction 
site-associated DNA (RAD) were detected and used to map a 
region of low-density CB4856 polymorphisms in the genome. 
This RAD mapping approach, adapted from Drosophila and other 
species, 24,25 allows for a higher throughput of mutant mapping as 
sequencing volume is reduced. As only a fraction of the genome 
is sequenced, it does not, however, allow simultaneous identifica- 
tion of the causal mutation. Instead, the authors returned to the 
mutant and pulled down the genomic interval corresponding to 
the mapped region by annealing sheared mutant genomic DNA 
to corresponding biotinylated fosmid fragments attached to mag- 
netic beads. 26 These mutant fragments were then deep sequenced 
to identify candidate causal mutations within. 

EMS-Based Deep Sequence Mapping 

EMS-based mapping works on a similar yet different principle 
where instead of using polymorphisms between two different 
strains to detect a genomic region linked to the causal muta- 
tion, the nucleotide changes caused by the EMS mutagenesis 
itself are used (Figs. IB and 3). The approach was first proposed 
and demonstrated in a proof-of-principle experiment in C. ele- 
gans where several independent mutations that cause a defect 
in a transdifferentiation event, where the rectal epithelial Y cell 
converts into the motoneuron PDA cell, 27 " 2 ' were mapped to dif- 
ferent regions on chromosomes III and X. 12 One of these, the 
previously uncharacterised mutant strain carrying the fp6 allele 
was confirmed to be caused by a missense mutation in the hox 
gene egl-5, situated within the middle of the mapped region on 
chromosome III. 12 

The principle of the method is simple. During mutagen- 
esis treatment, EMS randomly distributes mutations across the 
genome. EMS has a strong bias for inducing Guanine (G) to 
Adenosine (A) nucleotide transitions. 14,30,31 The reciprocal strand 
of DNA presents this mutation as a Cytosine (C) to Thymine 
(T) substitution. The majority of these EMS-induced nucleotide 
changes are removed from the background by back/outcrossing 
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Figure 1. Principle of each of the simultaneous mapping and mutant identification methods used in C. elegans. (A) SNP-based strategy requiring 
crossing of the mutant (N2 background) to CB4856, pooling of multiple isolated F 2 recombinants that display the phenotype and deep sequencing 
to discover a genomic region where only N2 SNPs reside. This region should contain the causal mutation (green star). Adapted from reference 34, (B) 
EMS-based strategy whereby the mutant is backcrossed multiple times until EMS-induced mutations (white stars) are cleared, except for those geneti- 
cally linked to the causal mutation (green star). The cluster of EMS-induced GA > CT nucleotide changes is detected by deep sequencing and localized 
to a genomic region that should contain the causal mutation. (C) A bulk-segregant approach to the EMS-based mapping strategy that involves a single 
backcross followed by pooling of F 2 recombinants and deep sequencing to identify a genomic region where EMS induced GA > CT are exclusive (white 
stars). In this region, the causal mutation (green star) should be located. 
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Figure 2. Flowchart of SNP-based simultaneous map- 
ping and mutant identification using deep sequencing. 
This strategy is similar to traditional SNP mapping tech- 
niques performed before whole genome sequencing 
was feasible. However, instead of analyzing the linkage 
of SNPs to a causal mutation one at a time through 
laborious identification procedures, deep sequencing 
is used to interrogate most of them simultaneously (see 
ref. 21). Twenty to 50 F 2 progeny displaying the mutant 
phenotype are picked from a cross between the mutant 
(usually in a Bristol N2 background) and a polymorphic 
strain such as CB4856. They are then singled onto fresh 
plates, grown and pooled before DNA is extracted and 
sequenced. From the deep sequencing data and with 
the use of available software, 34 a genomic region linked 
to the causal mutation will be identified and nucleotide 
mutations within this region will also be revealed. A list 
of candidate causal mutations that affect genes within 
this region will also be assembled. The original mutant 
is then subjected to multiple rounds of backcrossing 
or outcrossing (> six simple backcrosses or equivalent 
recommended) to remove all other EMS-induced muta- 
tions from its background and then tested via several 
methods (e.g., complementation analysis and transgenic 
rescue) to determine which candidate mutation is caus- 
ing the phenotype. 



of the mutant to a wild-type strain (for instance the pre-muta- 
genised strain, or the N2 reference strain). Several (three to six) 
rounds of backcrossing ensure that enough recombination events 
between the mutant background and the un-mutated background 
have occurred so that a distinct "hot spot" of EMS-induced DNA 
damage can be observed when visualizing the genome sequence 
(Fig. IB). Thus, these genetically linked EMS-induced nucleo- 
tide changes remain present after backcrossing and can be eas- 
ily observed as G-to-A or C-to-T variants in a genome sequence. 



We have tested and confirmed that outcrossing to 
the pre-mutagenized strain or an alternative strain 
is equally effective at detecting this EMS damage 
hot spot. 

Since EMS damage rather than SNPs are used 
to map the location of the causal mutation, this 
method negates the need to create a mapping popu- 
lation between intra-species polymorphic strains. 
Therefore, the method is theoretically open to 
any species where such intra-species strains are not 
available or characterized. The only requirements 
are that the species is susceptible to mutagenesis by 
either EMS or Af-ethyk/V-nitrosourea (ENU), 14 an 
alternative chemical mutagen that also produces 
canonical nucleotide changes that can be tracked, 
and that the species can also be back/outcrossed. 

Limiting Factors to Mapping Resolution 



Despite the fact that the frequency of EMS-induced 
nucleotide changes during a typical mutagenesis 
experiment is lower when compared with the fre- 
quency of SNPs between N2 and Hawaiian strains, 
the mapping resolution of each method is limited 
by chromosomal recombination events instead of polymorphism 
frequency. In our experience, adult worms exposed to a standard 1 
50 mM dose of EMS over 4 h were found to contain one canoni- 
cal EMS mutation (G > A or C > T) in every 125,000-143,000 
base pairs. 12 Polymorphic nucleotide differences between N2 and 
CB4856 are distributed across the genome at an average approxi- 
mate frequency of one in every 1,000 base pairs. 13 Mapping 
using either SNP- or EMS-based approaches is usually resolved 
to a genomic region with a width in the millions of base pairs, 
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a limitation resulting from recombination frequency 
on the chromosome carrying the causal mutation. 

In C. elegans, chromosomal recombination events 
tend to be restricted to one per homologous pair in 
most meiosis. 32 Four backcrosses would thus typically 
result in four recombination events per chromosome, 
and six backcrosses would result in six recombination 
events per chromosome, and so on. In our experience, 
three to six backcrosses are sufficient to effectively 
strip chromosomes clean of EMS-exposed DNA. 
On the chromosome containing the causal muta- 
tion, a variable width (but usually around 5 Mb) of 
chromosomal DNA will still contain EMS damage 
that is linked to the causal mutation. This variabil- 
ity is due to the random chance that at least one out 
of three to six recombination events (if three to six 
backcrosses or outcrosses are performed) will occur 
within a certain distance of the causal mutation, the 
probability of which decreases the closer the distance 
is to the mutation. It also depends upon the location 
of the causal mutation in relation to recombination 
hotspots, genomic intervals in which crossover events 
occur at a much higher frequency. 33 Obviously, more 
backcrosses will increase the chance of a recombina- 
tion event occurring closer to the causal mutation, 
thus increasing mapping resolution. 

To further increase the number of recombination 
events that can be observed by deep sequencing, mul- 
tiple recombinants can be analyzed together. Each 
recombinant obtained from a cross has undergone 
independent recombination events and, thus, ana- 
lyzing several recombinants instead of one recombi- 
nant increases the resolution of mapping simply by 
increasing the number of chances that a recombina- 
tion event has occurred more closely to the causal 
mutation. This is why SNP-based mapping proto- 
cols pool genomic DNA from multiple recombinant 
F, progeny before deep sequencing (see above, and 
Fig. 1A). In fact, this same bulk segregant approach 
can also be performed when using EMS-based map- 
ping (Fig. 1C), as has recently been shown in both 
plants (rice) and C. elegans. 19,34 In C. elegans, it was found that 
this approach resulted in a mapping resolution (-1 Mb) very 
similar to that of SNP-based mapping. 34 In this example, link- 
age of background nucleotide variants present in the strain as a 
consequence of genetic drift were also followed to complement 
that of mutagen-induced nucleotide variants (variant density 
mapping). 34 

Thus, at the last round of backcrossing, which might be 
reduced to as little as one round if desired, multiple recombinant 
F 2 progeny can be selected, grown and pooled for DNA extrac- 
tion (Fig. 4), much as in the manner as that performed for SNP- 
based mapping. 21 An added benefit of this approach is that at 
least one round of backcrossing has already been performed prior 
to mapping and sequencing, which will save time later, when 
more backcrosses are required prior to mutant characterization. 
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Figure 3. Flowchart of EMS-based simultaneous mapping and mutant identification 
using deep sequencing. This strategy for identifying a causal mutation traces the 
distinctive changes in nucleotides caused by EMS itself to map the location of the 
mutation. Several rounds of backcrossing remove EMS-induced nucleotide changes 
that are un-linked to the causal mutation, while leaving a "hot spot" of EMS damage 
surrounding the causal mutation (see ref. 12). Only a single recombinant is needed at 
the end of backcrossing to grow and extract DNA for deep sequencing. This is also ad- 
vantageous when working with a mutant with a low penetrance or subtle phenotype. 
Deep sequencing data and analysis with available software 34 will reveal a genomic 
region linked to the causal mutation. Nucleotide mutations within this region will 
also be revealed and a list of candidate causal mutations that affect genes within 
this region will be assembled. The mutant can then be analyzed directly (since it has 
already been backcrossed) via several methods (e.g., complementation analysis and 
transgenic rescue) to determine which candidate mutation is causing the phenotype. 



Advantages and Disadvantages of Each Strategy 

Below we compare features of the SNP-based and EMS-based 
methods for simultaneously mapping and sequencing mutations, 
which are summarized in Table 1 and can be used as a basis when 
deciding which strategy to implement. Table 2 summarizes the 
considerations and choices along the way. 

Selecting multiple vs. a single recombinant F 2 . One scenario 
where it can be difficult to pool multiple recombinant progeny 
is when the mutant has a subtle phenotype, or has a very low 
penetrance or a combination of both. In our experience, with a 
mutant of around 5% penetrance and a subtle single-cell level 
phenotype, we found that it was very convenient that the isola- 
tion of only one recombinant was sufficient for mapping using 
the EMS-based approach (Fig. 3). This particular mutant was 
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Figure 4. Flowchart of a bulk segregation approach to 
EMS-based mapping using deep sequencing. Only a sin- 
gle backcross would be necessary to perform EMS-based 
mapping if multiple recombinant F 2 mutant progeny were 
singled, grown and pooled for deep sequencing. This 
approach has the added advantage of saving time later 
by reducing the number of backcrosses needed before 
mutant analysis as well as avoiding a CB4856 cross, which 
may preclude some mutant phenotypes. More than one 
backcross can be performed prior to selection of 20-50 
recombinants, which would further increase mapping 
accuracy. Again, deep sequencing data and analysis with 
available software 34 will reveal a genomic region linked 
to the causal mutation. Nucleotide mutations within this 
region will also be revealed and a list of candidate causal 
mutations that affect genes within this region will be as- 
sembled. The mutant is then backcrossed, although a less 
number of times since it has already undergone a round 
of backcrossing, and then tested via several methods 
(e.g., complementation analysis and transgenic rescue) 
to determine which candidate mutation is causing the 
phenotype. 



mapped to a section of chromosome V, where a candidate muta- 
tion within was confirmed to be causal (SZ and SJ, data not 
shown). This is a distinct advantage of EMS-based mapping 
when compared with SNP-based mapping and may also be useful 
when examining behavioral phenotypes. In such circumstances, 
recombinant progeny may need to be selected through complex 
analyses of behavior, which may need to be preceded by ampli- 
fication of animal numbers. This is a challenging task that is 
further complicated by mutations with incomplete penetrance. 
Thus, proceeding through backcrossing, which is necessary any- 
way, and selecting a single recombinant at each step (Fig. 3) may 



be a more convenient strategy to map and sequence 
such behavioral mutants. 

The same can also be said for mapping mutants 
from modifier genetic screens, where a second muta- 
tion (enhancer or suppressor) needs to be maintained 
together with the primary mutation. Indeed, a useful 
strategy would be to use the strain containing the pri- 
mary mutation for backcrossing, thus ensuring that 
this mutation is maintained as homozygous through- 
out. This method would simplify identification of F 2 
recombinants containing both mutations, which can 
be subsequently used for either single recombinant or 
bulk segregant EMS-based mapping (Figs. 3 and 4). 
This strategy cannot be achieved when outcrossing to 
CB4856 to create a mapping population. 

Complications arising from background nucleo- 
tide variants. Another key difference between the 
two methods is that SNP-based mapping requires 
the introduction of variants (CB4856 SNPs) into the 
background, whereas EMS-based mapping requires 
the removal of variants (caused by EMS) from the 
background. For some phenotypes, it is possible that 
altering the genetic background via the introduction 
of CB4856 polymorphisms may modify the pheno- 
type. As such, it would not be possible to create a 
mapping population by crossing to CB4856. 

Comparisons between mutant sequences. Discerning 
between false nucleotide variants that may have been called as 
a result from comparison to the N2 reference genome (which 
contains rare sequencing errors, -1/100,000 nucleotides 13 " 15 ), 
background variants specific to the strain used in the muta- 
genesis screen and nucleotide changes induced by EMS is a key 
aspect of successfully identifying bona fide candidate mutations. 
Even within a small genomic region mapped to a few million 
base pairs, there are still many background variants that hamper 
the selection of candidate mutations. Sequencing and directly 
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Table 1. Advantages and disadvantages of SNP-based and EMS-based mapping strategies 



Method 

SNP-based approach 



EMS-based 
approach 



Non-bulk 
segregant 
approach 



Bulk- 
segregant 
approach 



Advantage 

• Out/backcrossing not necessary for mapping mutation 

• CB4856 strain readily available for mapping population 

• Mapping population can be used for rough mapping prior 

to deep sequencing 

• Crossing step quicker than multiple rounds of out/back- 

crossing 

• Introduced CB4856 polymorphisms may modify some phe- 

notypes (potentially interesting) 

• No need to create a mapping population 

• Only one animal needs to be recovered for each backcross 

step and for sequencing 

• Out/backcrossing is inherent and therefore does not need 

to be performed later 

• Comparison between multiple mutant isogenic sequences 

possible 

• Crossing to the same strain won't preclude phenotype 

• Useful for cloning modifier mutations 

• Mapping resolution greater than non-bulk segregant 

approach 

• Crossing step quicker than multiple rounds of out/back- 

crossing 

■ Reduced number of out/backcrosses to be performed later 

• Comparison between multiple mutant isogenic sequences 

possible 

• Crossing to the same strain won't preclude phenotype 

• Useful for cloning modifier mutations 



Disadvantage 

• Need to create a mapping population (20-50 F 2 ) 

• Still requires out/backcrossing after candidate muta- 

tions identified 

• Need to sequence background strain separately as 
comparison directly to other mutant sequences hin- 
dered by recombinant CB4856 genome 

• Introduced CB4856 polymorphisms may preclude 

some phenotypes 

• Necessary out/backcrossing takes a longer (horizontal) 
time 

• EMS-induced nucleotide changes less frequent than 
polymorphisms (although not limiting factor to map- 
ping resolution) 

• Mapping resolution more variable due to lower num- 

ber of recombination events 

• Possibility of low EMS mutation density surrounding 
causal mutation, which may reduce mapping resolution 

• Need to create a mapping population (20-50 F ) 

• EMS-induced nucleotide changes less frequent than 
polymorphisms (although not limiting factor to map- 
ping resolution) 

• Still requires more out/backcrossing after candidate 

mutations identified 

• Possibility of low EMS mutation density surrounding 
causal mutation, which may reduce mapping resolution 



Table 2. Step by step practical considerations and choices when implementing deep sequencing strategies for mapping and identifying mutations 



Choose a mapping strategy 


Impinges on whether additional genetic tests will be necessary (see selecting multiple vs. a 
single recombinant F2, complications arising from background nucleotide variants and time 

involved) 


Choose how many alleles will be sequenced in parallel 


(see comparisons between mutant sequences and ability to prioritize mutants to analyze) 


Determine the depth of sequencing 


impinges on the accurate calling of variants (see limitations to deep sequencing for mutant 

identification) 


Choose an analysis platform 


(see analyzing deep sequencing data). The parameters and filtering stringency used will 
impact on the ability to make accurate variant calls 


Identify candidates for the causal mutation 


- start looking for mutations altering the coding sequence or splice donor/acceptor sites 

- filter out possible sequencing errors or variants from the N2 reference sequence 

- look at all variants in the mapped region (i.e., point mutations vs. indels, EMS-canonical vs. 

non-canonical changes) 

(see analyzing deep sequencing data and limitations to deep sequencing for mutant 

identification) 



comparing multiple mutants of the same genetic background 
in parallel, or in independent sequencing experiments, makes 
identifying candidate mutations a much simpler task as shared 
variants between the sequenced strains can be discounted. This 
is relatively straightforward when performing EMS-based map- 
ping, as mutants are moved toward an isogenic state through 
multiple rounds of backcrossing. Of course, in order not to 
discount a true causal mutation that happens to present itself 
as the exact same DNA lesion in two separate mutants (a rare 



occurrence), it is important to confirm that each variant being 
discounted occurs also in separate mutants that map to different 
regions of the genome. 

However, for SNP-based mapping, where sequence compari- 
son may be hindered between mutants of the same screen due to 
the introduction of stretches of CB4856 genomic DNA, the pre- 
mutagenized strain may be sequenced separately for comparison. 
In addition, subtraction of already known and published nucleo- 
tide variants in C. elegans will help to facilitate the identification 
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Table 3. Latest technology and costs of deep sequencing (adapted from 
Hayden2013") 



Technology 


Cost of machine 


Read 
length 


Cost per Mb 


Ion Torrent Proton 


$224,000 USD 


200 bp 


1-9 cents 


lllumina HiSeq 


$690,000 USD 


300 bp 


4-5 cents 


lllumina MiSeq 


$125,000 USD 


500 bp 


14-70 cents 


Ion Torrent PGM 


$49,000 USD 


400 bp 


60 cents-$5 USD 


PacBio RS 


$695,000 USD 


4,575 bp 


$2-17 USD 



While multiplexing and increased reads capacity per chip have enabled 
the sequencing costs to lower, the costs associated with the prepara- 
tion of the sequencing library have remained stable. The number of 
times the genome will be sequenced, called the sequencing depth or 
coverage, dictates the total cost. Coverage can be calculated as the read 
length x the number of reads x 1 or 2 depending on whether single or 
paired ends are sequenced, over the haploid genome length. Because 
reads are not distributed evenly over the genome, many bases will be 
covered by more reads than the average aimed for while others will 
be covered by fewer reads. In addition, certain regions are harder to 
sequence. In our experience, a coverage of 20X-30X is sufficient for ac- 
curate mutation identification. 

of putative phenotype-causing mutation [for example, see file 
Fig. SI, 12 Table SI, 15 WormBase (www.wormbase.org) 35 ]. 

Time involved. One drawback of backcrossing or outcross- 
ing multiple times compared with a single cross with CB4856 
to create a mapping population is the horizontal time required. 
Four or more crosses to the original un-mutagenized strain can 
take a month or more, although as said before, it is possible to 
make fewer backcrosses-for instance, just one backcross-and 
select and pool multiple F 2 recombinants for DNA sequencing 
(Fig. 4). In either case, any mapping strategy will not avoid the 
need to eventually backcross or outcross mutagenized animals in 
order to remove background mutations that could confound the 
interpretation of the impact of the primary mutation. In fact, it 
would be advisable under any circumstances, before investing in 
any directed efforts to sequence, to first backcross or outcross a 
mutant one or several times to determine if the mutation behaves 
in a Mendelian fashion, is a single locus, is linked to chromosome 
X and is recessive or dominant. 

Ability to prioritize mutants to analyze. If one wishes to first 
prioritize mutants from a screen to be sequenced based on their 
likelihood of affecting distinct loci, then generating a mapping 
population with CB4856 may provide an advantage. Although 
somewhat defeating the purpose of simultaneously mapping and 
sequencing a mutation, rough mapping (using traditional tech- 
niques) of the mapping populations prior to sequencing might 
hint that multiple mutants affect the same gene and, thus, only 
one of these could be sequenced. Caution should be used with 
such an approach since rough mapping is indeed a very coarse 
method to localize a mutation to an approximate chromosomal 
region containing many genes. Alternatively, complementation 
analysis of each mutant from a screen can be employed without 
the need for a mapping population, and is a much more accurate 
method to assess whether multiple mutants of the same screen 
affect the same gene. However, in light of the spiraling down- 
ward costs of deep sequencing (see next section), it may not 



be worthwhile at all to choose which mutants to sequence, but 
rather to sequence them all, saving both time, reagents and labor. 
Indeed, sequencing multiple mutant alleles of the same gene 
would be very useful for immediate identification of candidate 
mutations, and can be very informative for the function of the 
gene, advantages that outweigh redundancy concerns. 

Cost of Whole Genome Sequencing 

Currently, a mutant C. elegans genome can be sequenced to 20X 
fold coverage for less than $500 USD. Reducing prices can be 
attributed to advances in deep sequencing technology that allow 
a greater density of DNA clusters on the surface of each flow 
cell lane of a sequencing chip, as well as longer read lengths and 
analysis of multiple independent genomes on a single flow lane, 
multiplexing. 36 Over the past decade, the plummeting costs of 
DNA sequencing have exceeded Moore's law, which describes a 
long-term trend in the computer hardware industry of a doubling 
of power every 2 y. 37 Table 3 provides a summary of the latest 
sequencing platforms and their associated costs. 

Analyzing Deep Sequencing Data 

Analyzing deep sequencing results from C. elegans was aided by 
the development of programs such as MAQgene, 38 which uses 
the assembly building software MAQ. 39 Although once useful, 
MAQ is now an outdated aligner and installation of MAQgene 
required specialized bioinformatic expertise. More recently, a 
cloud-based pipeline called CloudMap was developed, which is 
entirely Internet-based and therefore does not require any soft- 
ware installation. 34 This latest development allows users to choose 
from several aligners including the Burrows-Wheeler Aligner 
(BWA) 40 and Bowtie. 41 With CloudMap, a specific map posi- 
tion and list of candidate variants from raw sequencing data from 
all major NGS platforms (lllumina, ABI, 454) can be obtained 
using either of the different mapping strategies discussed in this 
review (EMS-based and SNP-based mapping). Alternatively, 
SAMtools (Sequence Alignment/Map) 42 may be used for align- 
ment and nucleotide variant identification but does not incorpo- 
rate the automated workflow for SNP- or EMS-based mapping 
outputs that CloudMap does. Readers are referred to the primary 
papers of each of these alignment programs for optimization of 
parameters such as mapping quality and base quality thresholds 
as well as read depth filters to alter the stringency of nucleotide 
variant lists generated. Calling of bona fide deletions (see below) 
is particularly sensitive to the settings used, as deleted sections of 
DNA can be interpreted as uncovered regions of sequence. 

Once a mapping interval has been determined (a graphical 
output is presented by CloudMap), it is then up to the user to 
discriminate those mutations in the mapping region that are the 
strongest candidates to examine in priority. It is likely that the 
causal mutation will affect the coding sequence of a given gene 
(a description of which will be in the output list), generating a 
missense, nonsense or a splicing defect. However, it is entirely 
possible that mutations in regulatory regions upstream, down- 
stream or within introns may also cause the phenotype and, 
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therefore, should not be rashly discounted. It is also important 
to take into account that EMS does not only induce canonical 
G to A nucleotide transitions (C to T on the opposite strand), 
but also other DNA lesions including deletions. Although occur- 
ring at a far lesser frequency than point mutations (which are not 
only restricted to G > A changes 14 ), these EMS -induced deletions 
can be quite large (e.g., 1,888 bp in C. elegans). n Thus, when a 
mapping region has been determined, a thorough examination 
of all nucleotide transitions and tranversions, as well as an inves- 
tigation of potential deletions, which may be called as "uncov- 
ered regions" (depending on the aligning software and settings 
used), should take place in this region. True deletions or inser- 
tions can be confirmed as such by PCR and Sanger sequencing, 
and if coverage of a particular nucleotide variant is extremely low 
(e.g., < 3-fold) the same can be done for point mutations. Of 
course, if the same "uncovered region" occurs in an independent 
strain or in an independent mutant of the same background, it 
likely reflects either a difficult region to sequence or align (e.g., 
repetitive sequences) or a deletion initially present in the starting 
strain and is therefore not causal. Confirmation of a causal muta- 
tion can be achieved via phenocopy (using RNAi or available 
mutants), complementation analysis and transgenic rescue. 

Limitations to Deep Sequencing 
for Mutant Identification 

One potential limitation to the successful cloning of a mutant 
using deep sequencing arises if the causal DNA lesion is not cov- 
ered by the sequences obtained. However, the chances of this 
occurring are quite slim if enough depth in sequence is obtained. 
Flibotte et al. (2010) calculated that the overall sensitivity to 
detect point mutations was 89% when sequencing to a depth of 
20X (the average number of times each nucleotide was sequenced 
across the genome). This was improved to 95% when focusing 
only on exons, in which most causal mutations are likely to occur 
in. 14 Another potential problem relates to the potential of false 
negative or false positive miscalls of nucleotide variants, espe- 
cially of poorly covered or difficult to sequence genomic regions. 
The simultaneous mapping and deep sequencing strategies dis- 
cussed here are relatively robust to individual variant miscalls, as 
they each rely on the combined data of many variant calls (or lack 
thereof). However, when examining in detail the mapping region 
to identify candidate mutations, miscalling of nucleotide variants 
may prove crucial in identifying the true causal mutation. 

Phenotypes arising from gene duplication may also present a 
problem when trying to identify a genetic cause. This type of 
DNA lesion would be quite difficult to detect with deep sequenc- 
ing. The number of reads obtained for a given section of genome 
is very variable and, as such, the copy number of a gene would 
be very difficult to ascertain. Although, we have observed that 
integrated, multi-copy transgenes present as a greater number of 
sequence reads than neighboring genomic regions, a single dupli- 
cation event would be almost impossible to detect in this man- 
ner. It is possible however, that the individual sequence reads that 
cover each end of a duplicated gene will overlap with an entirely 
different region of genome than that of the genes' native position. 



This would hint at a possible duplication event as well as the loca- 
tion of the duplication within the genome. 

Finally, it may be that the causal mutation lies within a mis- 
annotated region (cDNA or exon not annotated) of the C. elegans 
genome. In such cases, we have found it very useful to verify the 
degree of conservation across different nematode genomes around 
putative mutations when these are not localized to a known ORF 
[for example, using the UCSC genome browser (genome. ucsc. 
edu/)]. 

Conclusion 

Maturation of next generation sequencing technology has enabled 
affordable and rapid whole genome sequencing. However, when 
using this technology for forward genetic screens, it is important 
to consider that identification of total nucleotide variants alone 
will not, in the majority of cases, allow a small enough list of 
candidate mutations to be practically tested. Instead, many back- 
ground variants across the genome-including those that affect 
coding sequences-will be revealed, potentially confounding the 
identification of the true causal mutation. For this reason, several 
methods have been developed to simultaneously deep sequence 
and map the causal mutation to a region of the genome, aiding 
the identification of the causal mutation. These strategies have 
dramatically shrunk the time needed to identify a causal muta- 
tion from a mutagenesis screen. As such, the steps involved in 
identifying a causal mutation (s) no longer limit forward genetic 
mutagenesis screens. Depending upon the particular screen to be 
performed and the characteristics and number of the resulting 
mutants, one or a combination of the alternative mapping strate- 
gies discussed here will be best suited to quickly and effectively 
identify the causal mutation and gene behind the phenotype. 
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